Dockerfile (instructions used to assemble a full operating environment) and github workflow file (instructions for building and packaging the docker image on github via actions).
This document comprises the manual for the Darwin Harbour Water Quality monitoring program analysis application. It provides information on:
R Graphical and Statistical Environment offers an ideal platform for developing and running complex statistical analyses as well as presenting the outcomes via professional graphical/tabular representations. As a completely scripted language it also offers the potential for both full transparency and reproducibility. Nevertheless, as the language, and more specifically the extension packages are community developed and maintained, the environment evolves over time. Similarly, the underlying operating systems and programs on which R and its extension packages depend (hereafter referred to as the operating environment) also change over time. Consequently, the stability and reproducibility of R codes have a tendency to change over time.
One way to attempt to future proof a codebase that must be run upon a potentially unpredictable operating environment is to containerise the operating environment, such that it is preserved to remain unchanged over time. Containers (specifically docker containers) are lightweight abstraction units that encapsulate applications and their dependencies within standardized, self-contained execution environments. Leveraging containerization technology, they package application code, runtime, libraries, and system tools into isolated units (containers) that abstract away underlying infrastructure differences, enabling consistent and predictable execution across diverse computing platforms.
Containers offer several advantages, such as efficient resource utilization, rapid deployment, and scalability. They enable developers to build, test, and deploy applications with greater speed and flexibility. Docker containers have become a fundamental building block in modern software development, enabling the development and deployment of applications in a consistent and predictable manner across various environments.
Shiny is a web application framework for R that enables the creation of interactive and data-driven web applications directly from R scripts. Developed by Rstudio, Shiny simplifies the process of turning analyses into interactive web-based tools without the need for extensive web development expertise.
What makes Shiny particularly valuable is its seamless integration with R, allowing statisticians and data scientists to build and deploy bespoke statistical applications, thereby making data visualization, exploration, and analysis accessible to a broader audience. With its interactive and user-friendly nature, Shiny serves as a powerful tool for sharing insights and engaging stakeholders in a more intuitive and visual manner. ## Git and github
Git, a distributed version control system, and GitHub, a web-based platform for hosting and collaborating on Git repositories, play pivotal roles in enhancing reproducibility and transparency in software development. By tracking changes in source code and providing a centralized platform for collaborative work, Git and GitHub enable developers to maintain a detailed history of code alterations. This history serves as a valuable asset for ensuring the reproducibility of software projects, allowing users to trace and replicate specific versions of the codebase.
GitHub Actions (an integrated workflow automation feature of GitHub), automates tasks such as building, testing, and deploying applications and artifacts. Notably, through workflow actions, GitHub Actions can build docker containers and act as a container registry. This integration enhances the overall transparency of software development workflows, making it easier to share, understand, and reproduce projects collaboratively.
Figure 1 provides a schematic overview of the relationship between the code produced by the developer, the Github cloud repositiory and container registry and the shiny docker container run by user.
Dockerfile (instructions used to assemble a full operating environment) and github workflow file (instructions for building and packaging the docker image on github via actions).
To retrieve and run docker containers requires the installation of Docker Desktop on Windows and MacOSx
The steps for installing Docker Desktop are:
Download the Installer: head to https://docs.docker.com/desktop/install/windows-install/ and follow the instructions for downloading the appropriate installer for your Windows version (Home or Pro).
Run the Installer: double-click the downloaded file and follow the on-screen instructions from the installation wizard. Accept the license agreement and choose your preferred installation location.
Configure Resources (Optional): Docker Desktop might suggest allocating some system resources like CPU and memory. These settings can be adjusted later, so feel free to use the defaults for now.
Start the Docker Engine: once installed, click the “Start Docker Desktop” button. You may see a notification in the taskbar - click it to confirm and allow Docker to run in the background.
Verification: open a terminal (or Powershell) and run docker --version. If all went well, you should see information about the installed Docker Engine version.
Additional Tips:
The task of installing and running the app is performed via a single deploy script (deploy_wq.bat on Windows or deploy_wq.sh on Linux/MacOSX/wsl). For this to work properly, the deploy script should be placed in a folder along with two additional folders (one called input and the other called parameters) that contains the input datasets (in csv format) and run time parameters. This structure is illustrated below for Windows.
\
|- deploy_wq.bat
|- input
|- 16_wq.csv
|- 17_wq.csv
|- overwrites.csv
|- weights_m.csv
|- weights_s.csv
|- parameters
|- config.ini
|- water_quality_guidelines.csv
|- spatial.csv
|- GIS
|- RCZ_rev24.*
|- SBZone_upper.*
|- Middle_Harbour_Upper.*
In the above illustration, there are two example water quality datasets (16_wq.csv and 17_wq.csv). To ensure that the application correctly identifies them as water quality datasets, it is important that they are named according to the following format: <yy>_wq.csv where the <yy> represents a two digit year (e.g. 16 for 2016). For additional information on the contents of these files, please see Section 6.1.7.1.
To set up the above structure:
create a new folder on your computer in a location of your choice that you are likely to remember and easily locate (e.g. on the desktop). Whilst the name of the folder is not important, it is recommended that it be named after the project (e.g. darwin_harbour_wq_monitoring).
download the deploy script from the projects github repository
go to the projects github repository (https://github.com/open-AIMS/dh_wq_monitoring.git) in a browser
click on either the deploy_wq.bat (Windows) or deploy_wq.sh (Linux/MacOSX/wsl).
click on the download button and select the project folder as the location to download the file to. If the file is automatically downloaded to a downloads folder, move the file to the project folder.
within the project folder, create folders called inputs and parameters (as outlined above) and place all the appropriate data sets into these folders
To run the app, navigate inside of the project folder and run (typically double click) on the deploy script. Upon doing so, you will be presented with a directory selection window that is prompting for the path of the project folder. Navigate to and select the project folder before clicking the “OK” button. Shortly thereafter, the application will appear in a browser tab.
deploy_wq.bat script
The deploy_wq.bat script performs the following:
docker is installed and available from the command line for the current userdocker is runningdocker image and if so pull them downdocker containerThis Shiny application is designed to ingest very specifically structured water quality datasets containing Darwin Harbour Water Quality monitoring data and produce various analyses and visualisations. The application is served from a docker container to the localhost and the default web browser.
Docker containers can be thought of a computers running within other computers. More specifically, a container runs an instance of image built using a series of specific instructions that govern the entire software environment. As a result, containers run from the same image will operate (virtually) identically regardless of the host environment. Furthermore, since the build instructions can specify exact versions of all software components, containers provide a way of maximising the chances that an application will continue to run as designed into the future despite changes to operating environments and dependencies.
This shiny application comprises five pages (each accessable via the sidebar menu on the left side of the screen):
Each page will also contain instructions to help guide you through using or interpreting the information. In some cases, this will take the from of an info box (such as the current box). In other cases, it will take the form of little symbols whose content is revealed with a mouse hover.
There are numerous stages throughout the analysis pipeline that may require user review (for example examining any data validation issues as well as the QAQC figures to confirm that the data are as expected). Consequently, it is advisable for the user to manually trigger each successive stage of the pipeline. The stages are:
Stage 1 - Prepare environment
More info
This stage is run automatically on startup and essentially sets up the operating environment.
../params/config.ini. These include:
focal_year: usually the final year of sampling, all artifacts (data/graphics) will be stored in a folder reflecting this yearmethod: the index method to apply when calculating indicesfoldcap: the folding cap to apply when calculating indicestuning: the tuning to apply when calculating indicessize: the number of bootstrapp samplesseed: the random seed to apply to bootstrappingStage 2 - Obtain data
More info
This stage comprises of the following steps:
../parameters/water_quality_guidelines.csv.../input/. These files are in the format of <number>_wq.csv, where <number> is a two digit number representation of the sampling year.../input/overwrites.csv.../input/weights_m.csv.../input/weights_s.csv.../input/hierarchy.csv.../parameters/spatial.csv.The tables within the Raw data tab of the Data page will also be populated (but wont be available for review until after the data have been processed in Stage 3).
Stage 3 - Prepare spatial data
More info
This stage comprises of the following steps:
../parameters/GIS. The files are:
RCZ_rev24.shpSBZone_upper.shpMiddle_Harbour_Upper.shpThe tables within the Processed data tab of the Data page will also be populated.
Stage 4 - Process data
More info
This stage comprises of the following steps:
../parameters/config.ini or the datafocal_year is undefined, define it based on the maximum dateThe tables within the Processed data tab of the Data page will also be populated and the Data page will be available for review.
Stage 5 - Calculate indices
More info
This stage comprises of the following steps:
Stage 6 - QAQC
More info
This stage comprises of the following steps:
The QAQC figures of the QAQC page will also be populated.
Stage 7 - Bootstrapping
More info
This stage comprises of the following steps:
Stage 8 - Summaries
More info
This stage comprises of the following steps:
The trend and effects figures of the Summaries page will also be populated.
Underneath the sidebar menu there are a series of buttons that control progression through the analysis pipeline stages. When a button is blue (and has a play icon), it indicates that the Stage is the next Stage to be run in the pipeline. Once a stage has run, the button will turn green. Grey buttons are disabled.
Clicking on button will run that stage (or stages in some cases). While the stage is in progress, a popup will be displayed over the buttons. This popup serves two purposes. Firstly, some tasks within some stages are computationally intense and thus take some time to perform. For such tasks, a progress bar will be displayed in the popup to inform you of the progress through this task. Secondly, as some stages/tasks are slow, it provides visual feedback about when a stage has truly started and completed and prevents the temptation to repeatedly click on a button when nothing appears to be happening.
Once a stage is complete, the button will change to either green (success), yellow (orange) or red (failures) indicating whether errors/warnings were encountered or not. If the stage was completed successfully, the button corresponding to the next available stage will be activated.
Sidebar menu items that are in orange font are active and clicking on an active menu item will reveal an associated page. Inactive menu items are in grey font. Menu items will only become active once the appropriate run stage has been met. The following table lists the events that activate a menu item.
| Menu Item | Trigger Event |
|---|---|
| Landing | Always active |
| Dashboard | Always active |
| Data | After Stage 4 |
| QAQC | After Stage 6 |
| Summaries | After Stage 8 |
| Manual | Always active |
Note, it is also possible to make all menu and buttons active using the Run in sequence toggle, however this should only be used if the full sequence of stages has already been run and you are returning to the analyses in a later session
Figure 2 provides a schematic overview of the sequence of filesystem events that occur during the development, deployment and running of this app.
deploy_wq.bat (or deploy_wq.sh) script, it will check whether docker is running and get input from the user about the location of the project directory.port 3838 of the localhost and this will be offered to the default browser.Before starting the application, it is important that you review the following files:
../parameters/config.ini. Specifically, review the following fields:
focal_year=: indicates the year that will be considered the focal sampling year. All outputs will be stored in a folder reflecting this display diagnostics for this the focal year is the most recent sampling year. Furthermore, many of the QAQC diagnostics pertain specifically to this focal year alone.method=: indicates the index method to employ (see ?@sec-index-methods for more information).foldcap=: indicates the value (on the fractional/fold scale) to cap indices (see ?@sec-index-foldcap for more information)tuning=: indicates the tuning value used in specific index calculations (see ?@sec-index-tuning for more information)size=: indicates the number of bootstrapp aggregations to useseed: indicates the random seed to use during any stoichastic processstart_date: the minimum date for analysed data. This allows the lower bound year of the data to be restricted (to omit earlier data if necessary). If this item is missing, the start_date will be determined from the observed data.end_date: the maximum date for the analysed data. This allows the upper bound year of the data to be restricted (to omit later data if necessary). If this item is missing, the end_date will be determined from the observed data.../parameters/water_quality_guidelines.csv. This file defines the guideline values associated with each measure as well as the labels to be used in tables/figures.../parameters/spatial.csv. This file defines the names used for Zones and Areas.../input/overwrites.csv. This file defines any expert overwrites that should be applied (in the event that the observed values are considered unrepresentative or unsuitable for some justifiable reason).../input/weights_m.csv. This file provides any specific weights that should be applied when aggregating items together on the Measure scale.../input/weights_s.csv. This file provides any specific weights that should be applied when aggregating items together on the spatial scale.Before describing the sequence of steps in the analysis pipeline, it is necessary to outline the major conceptual phases of the analyses.
Water Quality indices (which are standardized measures of condition) are typically expressed relative to a guideline (see Appendix (sec:guidelines?) on page (sec:guidelines?)) or benchmark. Of the numerous calculation methods available, those that take into account the distance from the guideline (i.e. incorporate difference-to-reference) rather than simply an indication of whether or not a guideline value has been exceeded are likely to retain more information as well as being less sensitive to small changes in condition close to the guidelines.
The challenging aspect of distance (or amplitude) based index methodologies is that what constitutes a large deviation from a benchmark depends on the scale of the measure. For example, a deviation of 10 units might be considered relatively large of turbidity (NTU) or salinity (ppt), yet might be considered only minor for the Total Nitrogen (\(\mu g/L\)). In order to combine a range of such metrics together into a meaningful index, the individual scores must be expressed on a common scale. Whilst this is automatically the case for Binary compliance, it is not necessarily the case for distance based indices.
Table 1 describes and compares the formulations and response curves of the Binary compliance method as well as a number of amplitude (distance based) indexing methods.
The Modified Amplitude and Logistic Modified Amplitude are both based on a base 2 logarithm of the ratio of observed values to the associated be benchmark (see Table 1). This scale ensures that distances to the benchmark are symmetric (in that a doubling and halving equate to the same magnitude - yet apposing sign). Furthermore, the logarithmic transformation does provide some inbuilt capacity to accommodate log-normality (a common property of measured values).
By altering the sign of the exponent, the Modified Amplitude methods can facilitate stressors and responses for which a failure to comply with a benchmark would be either above or below the benchmark (e.g. NTU vs Secchi depth). Further modifications can be applied to accommodate measures in which the benchmark represents the ideal and deviations either above or below represent increasingly poorer conditions (e.g. pH and dissolved oxygen).
The raw Modified Amplitude scores are relatively insensitive to small fluctuations around a benchmarks and sensitivity increases exponentially with increasing distance to the benchmark. The resulting scores can take any value in the real line [\(-\infty, \infty\)] and hence are not bounded\footnote{Unbounded indices are difficult to aggregate, since items that have very large magnitude scores will have more influence on the aggregation than those items with scores of smaller magnitude. Furthermore, unbounded scores are difficult to convert into alphanumeric Grades. Consequently, the Scores need to be scaled before they can be converted to alphabetical grading scale. There are two broad approaches to scaling (see Table 1):
Capping and scaling: The \(log_2\) scale can be capped to a range representing either a constant extent of change (e.g. twice and half the benchmark - a cap factor of 2) or else use historical quantiles (10th and 90th percentiles) to define the upper and lower bounds to which to cap the scale. Note historical quantiles are unavailable for the current application. Thereafter, either can be scaled to the range [0,1] via a simple formula (see Table 1 III.Scaled).
Logistic Modified Amplitude: By expressing the scores on a logistic scale, the range of scores can be automatically scaled to range [0,1]. Moreover, this method allows the shape of the response curve to be customized for purpose. For example, the relative sensitivity to changes close or far from the benchmarks can be altered by a tuning parameter.
Rather than aggregating across sites before calculating indices, we would suggest that indices should be calculated at the site level. This is particularly important when different measures are measured at different sites. Spatial variability can be addressed via the use of a bootstrapping routine (see below). We would recommend that measurements collected throughout the reporting year be aggregated together into a single annual value. This is primarily because most water quality guidelines pertain specifically to annual averages rather than single time samples. Although it is possible to incorporate uncertainty due to temporal variability, the low sparse temporal frequency of sample collection is likely to yield uncertainty characteristics that will swamp the more interesting spatial sources of uncertainty.
A useful metric for comparing the sensitivity of one indexing method over another is to take some representative longitudinal data and calculate indices based on the actual data as well as data that introduces progressively more noise.
In the absence of longitudinal data, it is possible to compare final outcomes based on each indexing formulation. Appendix \(\ref{sec:indexMethods}\) on provides comparisons of three index formulations (Binary, fsMAMP: Fixed cap Scaled Modified Amplitude and lMAMP: Logistic Modified Amplitude) for a range of spatial and Measurement hierarchical levels. It is not possible to offer quantile based scaling for the Modified Amplitude methods as long-term quantiles associated with the guideline values are not available.
| Method | Formulation | Response curve |
|---|---|---|
| Binary | \[ score_i = \left\{ \begin{array}{l l} 1 & \text{if}~x_i \le benchmark_i\\ 0 & \text{if}~x_i~\text{else}\\ \end{array} \right. \] | |
| Benchmark and WCS | \[ score_i = \left\{ \begin{array}{l l} 100 & \text{if}~x_i \le benchmark_i\\ 0 & \text{if}~x_i~\ge WCS_i\\ \left[1.0 - \lvert\frac{x_i - benchmark_i}{WCS_i - benchmark_i}\lvert\right].100 & \text{else}\\ \end{array} \right. \] | |
| Modified Amplitude | ||
| I. Raw (MAMP) | \[ score_i = \left\{ \begin{array}{l l} log_2(\frac{x_i}{benchmark_i})^{-1} & \text{if}>benchmark_i=\text{fail}\\ log_2(\frac{x_i}{benchmark_i})^{1} & \text{if}<benchmark_i=\text{fail}\\ \end{array} \right. \] | |
| II. Fixed caps (-1, 1) | \[ Score_i = \left\{ \begin{array}{l l} log_2(-1) & \text{if}~Score_i < -1\\ log_2(1) & \text{if}~Score_i > 1\\ Score_i &otherwise\\ \end{array} \right. \] | |
| II. Quantile based caps | \[ Score_i = \left\{ \begin{array}{l l} log_2(\frac{Q1}{benchmark_i})^{-1} & \text{if}~x_i<Q1\\ log_2(\frac{Q2}{benchmark_i})^{1} & \text{if}~x_i>Q2\\ Score_i &otherwise\\ \end{array} \right. \] | |
| III. Scaled | \[ Score_i = \frac{Score_i - min(Score_i)}{max(Score_i) - min(Score_i)} \] | |
| Logistic Scaled Modified Amplitude | \[ \lambda_i = \begin{cases} -1 & \text{if}>benchmark_i=\text{fail}\\ 1 & \text{if}<benchmark_i=\text{fail} \end{cases}; R_i = \begin{cases} -1 & \text{if}>benchmark_i=\text{fail}\\ 1 & \text{if}<benchmark_i=\text{fail} \end{cases};\newline Score_i=\frac{1}{1+e^{\lambda R_i T}} \] |
Bootstrapping is a statistical resampling method that involves repeatedly sampling with replacement from a dataset to estimate the sampling distribution of a statistic. By generating multiple “bootstrap samples,” this technique allows for the computation of measures such as confidence intervals, standard errors, and other uncertainty metrics without relying on strong parametric assumptions.
Bootstrapping is particularly useful for aggregating collections of data while retaining the combined uncertainty through the aggregation process. When data from multiple sources or experiments are combined, the uncertainty associated with each dataset must be accounted for to avoid underestimating the variability in the aggregated result. Bootstrapping achieves this by:
In summary, bootstrapping is a powerful tool for combining data while retaining all sources of uncertainty, ensuring that the aggregated results accurately reflect the variability inherent in the original datasets.
To facilitate the integration of additional input Measures into the report card scores (such as additional Physico-chem or nutrients), or even additional Sub-indicators (such as sediment metals, seagrasses and mangroves), we can defined a hierarchical structure in which Measures (such as Turbidity, NOx, etc) are nested within appropriate Sub-indicators. In turn, these Sub-indicators are nested within Indicators.
By progressively abstracting away the details of the Measures and Sub-indicators, a more focused narrative can be formulated around each level of the hierarchy. For example, when discussing the current state (and trend in state) of the Water Quality Indicator, rather than needing to discuss each individual constituent of Water Quality, high-level Grades are available on which to base high-level interpretations. More detailed explorations are thence revealed as required by exploring the Grades at progressively finer scales of the hierarchy. Moreover, the hierarchical structure offers great redundancy and thus flexibility to add, remove and exchange individual measures.
Similar arguments can be made for a spatial hierarchy in which Zones are nested within Regions which in turn are nested within the Whole Harbour.
The current hierarchical structure comprises two physico-chemical Measures (Dissolved Oxygen and Turbidity) that are nested within a single Physico-chem Sub-indicator and four nutrient Measures (Ammonia, Chlorophyll-a, Filterable Reactive Phosphate and NOx) that are nested within a single Nutrient Sub-indicator.
There is future scope to add additional nutrient or physico-chemical Measures as well as metals and sediments. This hierarchy can be extended down to include other environmental indicators pertaining to habitats (mangroves, seagrass, corals) or other perspectives (cultural and economic).
The purpose of aggregation is to combine together multiple items of data. The current proposed Darwin Harbour report card is informed by a double hierarchical data structure in which Measures are nested within Sub-indicators which are nested in Indicators and Zones are nested within Regions which are nested within Whole of Harbour (see Figure 3).
Although the double hierarchy (Spatial and Measurement), does offer substantial redundancy and power advantages, it also introduce the complexity of how to combine the hierarchies into a single hierarchical aggregation schedule. Table 2 (a fabricated example), illustrates this complexity for aggregating across Spatial and Measure scales when data availability differs. This simple example demonstrates how different aggregation schedules can result in different Whole of Harbour Component Scores:
| Region | Physico - chem | Nutrients | Indicator |
|---|---|---|---|
| 1 | 5.00 | 2 | 3.50 |
| 2 | 6.00 | 6.00 | |
| 3 | 6.00 | 4 | 5.00 |
| Whole of Harbour | 5.67 | 3 | X |
If X (mean) is calculated from the three row means = 4.83,
If X (mean) is calculated from the two column means = 4.33
Figure 4 illustrates the proposed aggregation schedule for Darwin Report Cards. Importantly, all Indicator Scores are generated from Zone level data and separate Zone and Whole of Harbour level Measure aggregations are maintained in parallel starting at the level of Indicator.
Important notes on this aggregation schedule:
To maximize information retention throughout a series of aggregations, it is preferable to aggregate distributions rather than single properties of those distributions (such as means). The simplest way to perform a hierarchy of aggregations is to interactively calculate the means (or median) of items (means of means etc). At each successive aggregation level only very basic distributional summaries (such as the mean and perhaps standard deviation) are retained, the bulk of upstream information is lost. Alternatively, more complex methods that involve combining data or probability distributions can be effective at aggregating data in a way that propagates rich distributional properties throughout a series of aggregations.
Importantly, if the purpose of aggregation is purely to establish a new point estimate of the combined items, a large variety of methods essentially yield the same outcomes. On the other hand, if the purpose of aggregation is also to propagate a measure of uncertainty or confidence in the point estimate through multiple hierarchical levels of aggregation (as is the case here), then the different methodologies offer differing degrees of flexibility and suitability.
Hierarchical aggregations are essentially a series of steps that sequentially combine distributions (which progressively become more data rich). The resulting distribution formed at each step should thereby reflect the general conditions typified by its parent distributions and by extension, each of the distributions higher up the hierarchy.
Numerous characteristics can be estimated from a distribution including the location (such as mean and median) and scale (such as variance and range). For the current project, the mean and variance were considered the most appropriate distributional descriptions and from these estimates Grades and measures of confidence can be respectively derived. Hence the numerical summaries (mean and variance) at any stage of the hierarchical aggregation are a byproduct rather than the sole property of propagation.
Although some of the items to be aggregated together might initially comprise only a few values (or even a single value), it is useful to conceptualize them as continuous distributions. For example, when aggregating multiple Measures (such as all Water Quality Metals) together to generate a Site level) _Sub_indicator average, each Measure in each Site can be considered a distribution comprising the single Score for that Measure. Aggregation then involves combining together the multiple distributions into a single amalgam (by adding the distributions together, see Figure Figure 5). Similarly, when aggregating at the Indicator level across Site to generate Zone summaries for each Indicator, Site distributions are respectively added together to yield a single distribution per Zone.
If the distributions being aggregated are all proportional distributions (e.g.~density distributions), adding them altogether is trivially simple. However, if, rather than actual distributions, the items to be aggregated are actually just small collections of values (as is the case for many of the measures here) or even large, yet unequally populous collections of values (as could be the case for Continuous Flow Monitoring with missing or suspect observations), then simply aggregating the distributions together will result in amalgams that are weighted according to the size of the collections (larger collections will have more influence). For example, if we were aggregating together three (to yield Whole of Harbour estimates), one of which comprised twice as many Zones, simple aggregation of distributions would result in a distribution that was more highly influenced by the Region with the more . Similarly, when aggregating from the level of Sub-indicator to the level of Indicator, the resulting Indicator would be biased towards the Sub-indicator with the most Measures. Whilst this may well be a useful property (e.g.~stratified aggregation), it may also be undesirable.
Bootstrapping is a simulation process that involves repeated sampling (in this case with replacement) of a sample set with the aim of generating a bootstrap sample from a distribution. This bootstrap sample can be used to estimate the underlying probability distribution function that generated the data as well as any other summary statistics. Importantly, bootstrapping provides a way to generate distributions that are proportional and thus un-weighted by the original sample sizes thereby facilitating un-weighted aggregation. Bootstrapped distributions can be aggregated (added together) to yield accumulated child distributions that retain the combined properties of both parents (see Figure 5). As a stochastic process, repeated calculations will yield slightly different outcomes. Nevertheless, the more bootstrap samples are collected, the greater the bootstrap distributions will reflect the underlying Score distribution and provided the number of drawn samples is sufficiently large (e.g. 10,000 re-samples), repeated outcomes will converge.
Standard bootstrapping yields equally weighted distributions, however, specific weighting schemes can also be easily applied by bootstrapping in proportion to the weights. For example, to weight one parent twice as high as another, simply collect twice as many re-samples from the first distribution. To ensure that all resulting distributions have the same size (by default 10,000 items), the number of bootstrap samples collected (\(n\)) from each of the (\(p\)) parent distributions (\(i\)), given the weights (\(w_i\)) is calculated as:
\[ n_i = \lceil(S/p)\times w_i \rceil \]
where \(S\) is the target size (10,000) and \(\lceil.\rceil\) indicates the ceiling. Qualitative data (such as ratings) can also be incorporated by enumerating the categories before bootstrapping.
In addition to allowing expert driven weights that govern the contribution of different items during aggregations, it is possible to weight according to relative spatial areas during spatial aggregations. Currently, all Zones are equally weighted when aggregating to Region level and all Regions equal when aggregating to Whole of Harbour level. That means that small Zones have an equal contribution as large Zones despite representing a smaller fraction of the water body. Area based weights could be applied such that Zones and Regions contribute in proportion to relative areas.
Weights are defined by a user editable configuration file that is similar in structure to the Water Quality guidelines file.
The ability for experts and Report Card managers to intervene (exclude or overwrite) Scores/Grades at any Spatial/Measure scale is essential to maintain the quality of a Report Card in the event of unrepresentative or suspect data. The current system is able to support expert interventions in the form of exclusions and overwrites. For example, after reviewing the QAQC, an expert can elect to exclude one or more Measures (or Subindicators etc) from one or more spatial scales. Such interventions are specified via a user editable configuration files (csv) that is similar in structure to the Water Quality guidelines file.
The essential component of this configuration file is that it allows a user to specify what Data are to be excluded or replaced. These can be at any of the levels of the Measure hierarchy (Measures, Sub-indications and Indicators) and any level of the Spatial hierarchy (Zones, Regions and Whole of Harbour). Settings pertaining to levels further along the aggregation hierarchies have precedence. For example, if Nutrients are excluded (or overridden) in a particular Region, then all Nutrient Measures within all Zones will be excluded irrespective of what the settings are for any specific Measure/Zone.
To reiterate, the advantage of bootstrapping data before concatenating (or averaging) versus simply concatenating data from multiple sources together, is to ensure that source data are all of exactly the same sample size (so as to not weight more heavily towards the more populous source(s)). Bootstrapping also provides a mechanism for propagating all distribution information throughout an aggregation hierarchy and ensures that estimates of variance derived from child distributions are on a consistent scale. The latter point is absolutely critical if variance is going to be used to inform a Confidence Rating system and confidence intervals.
Minimum operator procedures are supported by filtering on the lowest performed indicator prior to bootstrapping. Importantly, the bootstrapping routine simply provides a mechanism to collate all sources together to yield a super distribution. Thereafter, the joint distribution can be summarized in what ever manner is deemed appropriate (arithmetic, geometric, harmonic means, medians, variance, range, quantiles etc). Moreover, different levels of the aggregation can be summarized with different statistics if appropriate.
The double hierarchy Bootstrap aggregation described above, yields Score distributions for each Measure-level/Spatial-level combination. The location and scale of each distribution can thus be described by its mean and variance. Mean Score are then converted into a simple five-point alphanumeric Grade scale (and associated colors) using a control chart (see Table (tab-control?)).
| Score | Grade | Description |
|---|---|---|
| \(\ge 0.85\) | A | Very good |
| \(\ge 0.65, <0.85\) | B | Good |
| \(\ge 0.5, <0.65\) | C | Satisfactory |
| \(\ge 0.25, <0.5\) | D | Poor |
| \(<0.25\) | E | Very poor |
The control chart grade boundaries adopted for the current report (presented in (tab-control?)) are consistent with the Gladstone Healthy Harbour Partnership Report Card. Broadly, they represent two levels (Poor and Very Poor) under the Guideline values and three above (Satisfactory, Good and Very Good). The grade boundaries are usually determined by expert panel to ensure that the range of indices represented by each grade classification is congruent with community interpretation of a letter grade report cards. It is far less clear how estimates of uncertainty can be incorporated into such a grading scheme in a manner that will be intuitive to non-technical audiences. That said, statistical uncertainty is just one of many sources of un- certainty that should be captured into a confidence or certainty rating. Hence any expectations of presenting uncertainty in a quantitative manner may well be unrealistic anyway.
Incorporating an estimate of scale (variance) into a certainty or confidence rating necessitates re-scaling the estimates into a standard scale. In particular, whereas a scale parameter of high magnitude indicates lower degrees of certainty, for a certainty rating to be useful for end users, larger numbers should probably represent higher degrees of certainty. Thus, the scaling process should also reverse the scale. Furthermore, variance is dependent on the magnitude of the values.
In order to re-scale a scale estimate into a certainty rating, it is necessary to establish the range of values possible for the scale estimate. Whilst the minimum is simple enough (it will typically be 0), determining the maximum is a little more challenging depending on the aggregation algorithm (bootstrapping, Bayesian Network etc). One of the advantages in utilizing proportional distributions (such as is the case for a Bayesian Network or a re-sampled bootstrap distribution) is that the scale parameter for the single worst case scenario can be devised (once the worst case scenario has been determined) independent of sample sizes or weightings. In most situations this is going to be when the distribution comprises equal mass at (and only at) each of the two extremes (for example, values of just 0 and 1).
The measure of confidence rating discussed above is purely an objective metric derived from the variance in the aggregation hierarchy. It is completely naive to issues such as missing data, outliers and Limit of Detection issues - the influences of which on a confidence rating are necessarily subjective. A full Confidence Rating would combine these objective variance component with additional subjective considerations such as climatic and disturbance information, and the perceived influence of missing, Limit of Detection and outlying data. Hence, the statistical scaled statistical variance would form just one component in the Confidence Rating system.
The bootstrap aggregation method provides a mechanism for estimating variance from which to build such an expert considered Confidence Rating system.
Confidence intervals (CI) represent the intervals in which we have a certain degree of confidence (e.g. 95%) that repeated estimates will fall. Hence the 95% CI of the mean is the range defined by the quantiles representing 95% of repeated estimates of the mean.
To calculate 95% confidence intervals for bootstrap aggregated distributions (e.g. Zone 1/Nutrient distribution), we repeatedly draw a single sample from each of the constituent distributions (e.g. a single value from the Zone 1 Ammonia, Chlorophyll-a, Filterable Reactive Phosphate and NOx distributions) and from each set of draws, calculate the weighted mean of the values. The 95% CI is thus calculated as the quantiles (p=0.025 and p=0.975) of the means.
A. Calculation of Zone level Scores and Grades
Collect raw data (= Measures) at each fixed monitoring site or along a Continuous Flow Monitoring (CFM) transect and compare individual observations to associated guideline/benchmark/reference
Create indexed data as an expression of degree of difference scaled modified amplitude method) to yield a Score for each Measure per sampling location (e.g. site or location along CFM transect) (applies to Measures in all Indicators, Water Quality)
Apply any expert opinion interventions
Combine Measure Scores into Zone-level Sub-indicator Scores by averaging taking into account any weightings, i.e. aggregate into observation-level Sub-indicator Scores. This step involves Bootstrapping each input to distributions of 10,000 re-samples (or fewer if weighted), combining distributions and finally Bootstrapping again into a single 10,000 size distribution.
Combine Sub-indicator Scores into Zone-level Indicator Scores by averaging, i.e. aggregate into Zone-level Indicator Scores.
Convert Scores into coloured Grades (A-E) for visual presentation in report card
B. Calculation of Region level Grades
Aggregate Zone-level _Sub-indicator** Scores from step A.5 into Region-level Sub-Indicator Scores by averaging (incorporating spatial weights)
Aggregate Region-level Sub-indicator Scores into Region-level Indicator Scores by averaging (incorporating weights)
C. Calculation of Whole Of Harbour Grades
Aggregate Region-level Sub-Indicator Scores from step B.1 into Whole of Harbour-level Sub-Indicator Scores by averaging (incorporating spatial weights)
Aggregate Whole of Harbour-level Sub-Indicator Scores into Whole of Harbour-level Indicator Scores by averaging (incorporating weights)
A key goal of any monitoring program is to be able to report on status and trends. The above methodologies provide the tools for reporting on the status in any given year. What is also required is the ability to compare the status between years - that is to report on the changes in status over time.
The bootstrapp aggregations yield full empirical distributions for each Measure/Space unit per year. Typically we summarise these distributions via a mean and confidence interval to simplify their presentation. Then, if we simply wanted to calculate the change in a score between two years, we could simply subtract the mean of one year from the other. However, doing so would loose the information we had on uncertainty. If on the other hand, we subtracted the distribution for one year from that of another year, we would obtain a distribution of change values from which we could provide summaries like mean and confidence intervals.
Since each of the values in the distributions are produced via bootstrapping, they are not in any specific order (that is they are not ranked from lowest to highest value) and there are the same number (\(n\)) of bootstrapp samples in each distribution. Hence, if we line up the values of two distributions side by side, we can calculate the difference between each pair of values down the line. This will yield \(n\) differences. Collectively, these \(n\) differences now represent the distribution of change values.
In addition to summarising the distribution of change by properties such as mean and confidence interval, we can also estimate the probability that the change exceeds a certain value (typically zero). This calculation is simply the number of values that exceed (zero) divided by the total number of values and hereafter will be referred to as an exceedance probability (\(P_E\)).
Between two particular years, scores might generally increase, decrease or stay the same. To explore an increase, we would report on the probability that the change values were positive (\(P_E>0\)) and to report on a decline we would focus on the probability that the change values were negative (\(P_E<0\)). Note, we cannot report on the probability that the change values are zero (logically this is infinitely small). Herein, these exceedance probabilities are interpreted according to:
Within this application, change is explored two ways:
These changes can then by presented graphically and in tabular form.
At the completion of this stage, the Data sidebar menu and Stage 3 & 4 button will become active and the Data page will be populated with the raw data and be available for review.
This task will read in the water quality guidelines information from the
../parameters/water_quality_guidelines.csv file.
This task will sequentially read in each of the water quality data files from ../input/. These files are in the format of <number>_wq.csv, where <number> is a two digit number representation of the sampling year.
This task will read in each of the overwrites file from ../input/overwrites.csv. The purpose of overwrites is to provide a mechanism for expert elicitation, whereby grades can be overwritten in the event that the expert strongly believes that the observed data are a poor reflection of the genuine health of the sampling unit.
There are two tasks to read read in the measures weights file from ../input/weights_m.csv and the spatial weights file from ../input/weights_s.csv. These files provide a mechanism to control how much weight different items have in an aggregation. If no weights are given, then all items are equally weighted during aggregation.
This task will read in the aggregation hierarchy file from ../input/hierarchy.csv. The aggregation hierarchy determines the child items for each aggregation.
This task will read in the spatial settings file from ../parameters/spatial.csv. These data form a lookup that relate zone and area numbers to their names and provides some settings for the location and colour of labels on maps.
This task performs data validation in accordance with the rules set out in the following section.
To be valid, input data must be flat text files (*.csv) comprising:
../input/*_wq.csv
| Field | Description | Validation conditions |
|---|---|---|
| Zone | Spatial zone | must be numeric or factor |
| Region | Spatial region | must be numeric or factor |
| Source | Source of samples (CFM or Discrete) | must be character or factor of either (CFM or Discrete) |
| Date | Sample data | must be a valid string |
| Latitude | Latitude of sample | must be a numeric of format -d.d |
| Longitude | Longitude of sample | must be a numeric of format d.d |
| Chla_mug_PER_L | Chlorophyll-a concentration | must contain only numbers or start with a ‘<’ symbol |
| Turbidity_NTU | Turbidity concentration | must contain only numbers or start with a ‘<’ symbol |
| Turbidity_NTU | should not exist for Discrete source | |
| DO_PERCENT_saturation | Percentage dissolved oxygen | must contain only numbers or start with a ‘<’ symbol |
| NH3_mug_PER_L | Ammonium concentration | must contain only numbers or start with a ‘<’ symbol |
| NH3_mug_PER_L | should not exist for CFM source | |
| PO4_mug_PER_L | Phosphate concentration | must contain only numbers or start with a ‘<’ symbol |
| PO4_mug_PER_L | should not exist for CFM source | |
| Nox_mug_PER_L | Nox (nitrate and nitrite) concentration | must contain only numbers or start with a ‘<’ symbol |
| Nox_mug_PER_L | should not exist for CFM source |
../input/hierarchy.csv
| Field | Description | Validation conditions |
|---|---|---|
| Component | Highest level of measure hierarchy (always Environmental) | must contain characters |
| IndicatorGroup | Next measure level (always Water Quality) | must contain characters |
| Indicator | Next measure level (always Water Quality) | must contain characters |
| Subindicator | Either Nutrients or Physico-chem | must contain characters |
| Measure | Name of the Measure | must contain characters |
../input/weights_*.csv
| Field | Description | Validation conditions |
|---|---|---|
| Component | Highest level of measure hierarchy (always Environmental) | must contain characters |
| IndicatorGroup | Next measure level (always Water Quality) | must contain characters |
| Indicator | Next measure level (always Water Quality) | must contain characters |
| Subindicator | Either Nutrients or Physico-chem | must contain characters |
| Measure | Name of the Measure | must contain characters |
| Zone | Spatial zone | must be numeric or factor |
| Region | Spatial region | must be numeric or factor |
| Site | Spatial site | must be numeric or factor |
| Weight | Spatial site | must be numeric |
../input/overwrites.csv
| Field | Description | Validation conditions |
|---|---|---|
| Component | Highest level of measure hierarchy (always Environmental) | must contain characters |
| IndicatorGroup | Next measure level (always Water Quality) | must contain characters |
| Indicator | Next measure level (always Water Quality) | must contain characters |
| Subindicator | Either Nutrients or Physico-chem | must contain characters |
| Measure | Name of the Measure | must contain characters |
| Source | Source of samples (CFM or Discrete) | must be character or factor of either (CFM or Discrete) |
| Region | Spatial region | must be numeric or factor |
| Zone | Spatial zone | must be numeric or factor |
| Site | Spatial site | must be numeric or factor |
| overwrittenGrade | Grade to apply (overwrite observations) | must contain characters (A, B, C, D or E) |
../parameters/water_quality_guidelines.csv
| Field | Description | Validation conditions |
|---|---|---|
| ZoneName | Zone name | must contain characters |
| HydstraName | Name in hydstra | must contain characters |
| Conversion | Unit conversion factor | must be numeric |
| Measure | Name of the Measure | must contain characters |
| UnitsLabel | Name of the Measure including units | must contain characters |
| Label | Name of the Measure including units (formatted for LaTeX) | must contain characters |
| DirectionOfFailure | Direction of failure relative to guideline value | must be a single character (B or H) |
| GL | Water quality guideline value | must contain only numbers or start with a ‘<’ symbol |
| RangeFrom | Water quality guideline lower limit of range (for DO) | must contain only numbers or start with a ‘<’ symbol |
| RangeTo | Water quality guideline upper limit of range range (for DO) | must contain only numbers or start with a ‘<’ symbol |
| DetectionLimit | Limit of detection value | must contain only numbers or start with a ‘<’ symbol |
../parameters/spatial.csv
| Field | Description | Validation conditions |
|---|---|---|
| Region | Spatial region | must be numeric or factor |
| RegionName | Region name | must contain characters |
| Zone | Spatial zone | must be numeric or factor |
| ZoneName | Zone name | must contain characters |
| Lab_lat | Latitude of zone label on maps | must be a numeric of format -d.d |
| Lab_long | Longitude of zone label on maps | must be a numeric of format d.d |
| HexColor | Hexidecimal colour code for zone labels on maps | must be a six digit hex code proceeded by a ‘#’ |
This task will read in individual shapefiles from ../parameters/GIS. The files are: - RCZ_rev24.shp - SBZone_upper.shp - Middle_Harbour_Upper.shp
This task will combine all shapefiles into a single shapefile
The water quality data are read in as separate csv files. Although technically they could have been consolidated into a single file before opening this application, maintaining separate files for each year does allow the user to isolate any validation issues to a single year (thereby making them easier to located and correct).
The current task will combine all the water quality data into a single data set.
Spreadsheets often display dates in different formats from what they actually store internally and what they export. It is always advisable when exporting dates from spreadsheets to export them as strings (words) rather than numbers (number of seconds past a reference date) as these are typically less ambiguous.
The current task will process the dates from strings into genuine date objects
It is possible to restrict the analyses to a narrower range of dates than those that appear in the observed data. For example, the observed data might include some samples from an otherwise incomplete sampling year, and it might be desirable to exclude these samples. Hence, this task determines whether a start_date or end_date have been defined in ../parameters/config.ini and if so, uses them to trim the temporal range of the data.
The tables within the Processed data tab of the Data page will also be populated and the Data page will be available for review.
In order to compute indices, it is necessary to compare observed values to associated guideline values. In the current context a Measure is an observable property of water quality (such as Chlorophyl-a, Turbidity, Nox etc) - it is the analyte that is measured. This current task will exclude any measures for which there are no guideline values as these cannot be analysed further.
The focal year represents the year that will be considered the focal sampling year for the analyses. All outputs will be stored in a folder reflecting this display diagnostics for this the focal year is the most recent sampling year. Furthermore, many of the QAQC diagnostics pertain specifically to this focal year alone.
Although it is often convenient to transcribe and store data in wide format (one in which each measure has its own column and each row represents a single sample collection), this format is illsuited for analyses and graphics in R. Analyses and graphics prefer data to be in long format. The following two tables highlight the differences between wide (top table) and long (bottom table) formats:
| Zone | Date | Source | Chla_mug_PER_L | Turbidity_NTU | Nox_mug_PER_L |
|---|---|---|---|---|---|
| 1 | 01/06/2016 | CFM | 1.338 | 2.68 | NA |
| 1 | 01/06/2016 | CFM | 1.376 | 2.63 | NA |
| 1 | 02/06/2016 | Discrete | 0.279 | NA | 0.013 |
| 1 | 02/06/2016 | Discrete | 0.178 | NA | 0.017 |
| Zone | Date | Source | Measure | Value |
|---|---|---|---|---|
| 1 | 01/06/2016 | CFM | Chla_mug_PER_L | 1.338 |
| 1 | 01/06/2016 | CFM | Chla_mug_PER_L | 1.376 |
| 1 | 02/06/2016 | Discrete | Chla_mug_PER_L | 0.279 |
| 1 | 02/06/2016 | Discrete | Chla_mug_PER_L | 0.178 |
| 1 | 01/06/2016 | CFM | Turbidity_NTU | 2.68 |
| 1 | 01/06/2016 | CFM | Turbidity_NTU | 2.63 |
| 1 | 02/06/2016 | Discrete | Turbidity_NTU | NA |
| 1 | 02/06/2016 | Discrete | Turbidity_NTU | NA |
| 1 | 01/06/2016 | CFM | Nox_mug_PER_L | NA |
| 1 | 01/06/2016 | CFM | Nox_mug_PER_L | NA |
| 1 | 02/06/2016 | Discrete | Nox_mug_PER_L | 0.013 |
| 1 | 02/06/2016 | Discrete | Nox_mug_PER_L | 0.017 |
This task will therefore pivot the data from wide to long format.
This task will join (merge) the guideline data with the water quality data based on the Zones.
This task will use the shapefiles to assign spatial information (Zones and Regions) based on the latitude and longitude of the samples.
This stage creates a lookup table that relates each of the spatial scales to one another. This lookup is used to inject the spatial information into the data and modelled derivatives after they are created and in so doing prevents the need to spatially join the data each time it is required.
In addition to defining the guideline values, the ../input/water_quality_guidelines.csv file defines any unit conversions that need to be applied. This task applies those unit conversions.
This task applies limit of detection rules. Specifically, for all measures other than dissolved oxygen, if the observed value is below the detection limit, then the observed value is replaced by the detection limit. In the case of dissolved oxygen, if the observed value is less than 50, the observed value is replaced by NA (a missing value). In either case, a flag is added if a limit of detection replacement is made.
The aggregation hierarchy determines what items are aggregated together during the bootstrapp aggregation process. This task joins the aggregation schedule into the data.
This task loads the processed data in preparation for indice calculations.
This current step prepares the Site/Year/Source level data for bootstrapp aggregation by repeatedly sampling these data (according to the nominated number of bootstrapp samples, \(n\), set in the ../parameters/config.ini file). In the case that only a single observation was collected at a specific site in a specific year for a specific source, then this observation will be repeated \(n\) times.
This task will load the processed data.
This task will load the indices ready for bootstrapping.
This task will determine what overwrites have been defined in the ../input/overwrites.csv file and make those specific changes to the indices prior to bootstrapping. Essentially, this task involves replacing the indices and grades calculated in Stage 5 (for the nominated measure/spatial scale) with those nominated the overwrites.
This task will determine what weights (measure or spatial) have been defined in ../input/weights_m.csv and ../input/weights_s.csv respectively and prepare for applying these weights during the appropriate aggregation phase.